
Conversation

@kimishpatel (Contributor) commented Nov 11, 2025

This allows us to leverage the temp memory allocator, and if that allocator is a caching allocator, it reduces allocation overhead.

Differential Revision: [D85532076](https://our.internmc.facebook.com/intern/diff/D85532076/)

[ghstack-poisoned]

pytorch-bot bot commented Nov 11, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/15728

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 5 Unrelated Failures

As of commit 1d96c89 with merge base 7600df8:

NEW FAILURE - The following job has failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were already failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@github-actions

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with `release notes:`. This helps us keep track of such changes and include your work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

…ory"

This allows us to leverage the temp memory allocator, and if that allocator is a caching allocator, it reduces allocation overhead.

Differential Revision: [D85532076](https://our.internmc.facebook.com/intern/diff/D85532076/)

[ghstack-poisoned]

Copilot AI left a comment


Pull Request Overview

This PR refactors memory allocation in the Flash Attention implementation to use the temporary memory allocator from RuntimeContext instead of stack-allocated std::vector objects. This enables the use of caching allocators when available, reducing allocation overhead.

  • Adds RuntimeContext& ctx parameter to cpu_flash_attention function
  • Replaces stack-allocated vectors with ctx.allocate_temp() calls with fallback to heap allocation
  • Removes unnecessary buf_reduced allocation (dead code for unsupported reduced types)
  • Updates all call sites to pass the RuntimeContext

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| extension/llm/custom_ops/op_sdpa_impl.h | Modified cpu_flash_attention signature to accept RuntimeContext; replaced vector allocations with temp allocator calls |
| extension/llm/custom_ops/op_sdpa.cpp | Updated all call sites (6 locations) to pass ctx parameter to cpu_flash_attention |



```cpp
namespace sdpa::impl {

static std::vector<char> scratch_for_quant_dequant_vec;
```

Copilot AI Nov 17, 2025


The static vector `scratch_for_quant_dequant_vec` is declared but never used. It appears to be a leftover from the refactoring in which the local vector was replaced with the temp-allocator approach; it should be removed.

Suggested change (delete this line):
```cpp
static std::vector<char> scratch_for_quant_dequant_vec;
```


Labels

CLA Signed (managed by the Facebook bot; authors must sign the CLA before a PR can be reviewed) · fb-exported · meta-exported
